Overview

Dataset statistics

Number of variables: 40
Number of observations: 228
Missing cells: 638
Missing cells (%): 7.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 334.4 KiB
Average record size in memory: 1.5 KiB
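
The derived figures above follow from the raw counts. A minimal sketch of the arithmetic, assuming the percentage is missing cells divided by variables × observations and the record size is total memory divided by observations (the variable names are illustrative, not taken from the report):

```python
# Raw figures reported in the dataset statistics above.
n_variables = 40
n_observations = 228
missing_cells = 638
total_kib = 334.4

# Missing cells as a share of all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average in-memory size of one record (row), in KiB.
avg_record_kib = total_kib / n_observations

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
print(f"{avg_record_kib:.1f} KiB per record")
```

Both results round to the reported 7.0% and 1.5 KiB.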

Variable types

Categorical: 27
DateTime: 3
Numeric: 8
Unsupported: 2

Warnings

TP_NOT has constant value "2" Constant
ID_AGRAVO has constant value "B54" Constant
NU_ANO has constant value "2014" Constant
SG_UF has constant value "33" Constant
ID_RG_RESI has constant value "" Constant
ID_PAIS has constant value "1" Constant
DEXAME has a high cardinality: 140 distinct values High cardinality
SEM_NOT is highly correlated with SEM_PRI High correlation
SG_UF_NOT is highly correlated with ID_MUNICIP High correlation
ID_MUNICIP is highly correlated with SG_UF_NOT High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
CS_GESTANT is highly correlated with PMM High correlation
PMM is highly correlated with CS_GESTANT High correlation
SEM_NOT is highly correlated with SEM_PRI High correlation
ID_MUNICIP is highly correlated with ID_MN_RESI High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
ID_MN_RESI is highly correlated with ID_MUNICIP High correlation
SEM_NOT is highly correlated with SEM_PRI High correlation
ID_MUNICIP is highly correlated with ID_MN_RESI High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
ID_MN_RESI is highly correlated with ID_MUNICIP High correlation
COUFINF is highly correlated with RESULT and 9 other fields High correlation
PMM is highly correlated with DTRATA and 2 other fields High correlation
CS_RACA is highly correlated with DTRATA and 3 other fields High correlation
RESULT is highly correlated with COUFINF and 15 other fields High correlation
AT_SINTOMA is highly correlated with RESULT and 5 other fields High correlation
ID_UNIDADE is highly correlated with AT_LAMINA High correlation
ID_REGIONA is highly correlated with RESULT and 5 other fields High correlation
SG_UF_NOT is highly correlated with RESULT and 5 other fields High correlation
SEM_NOT is highly correlated with DTRATA and 1 other field High correlation
DTRATA is highly correlated with COUFINF and 15 other fields High correlation
AT_LAMINA is highly correlated with RESULT and 6 other fields High correlation
CS_ESCOL_N is highly correlated with ID_MN_RESI High correlation
ID_MUNICIP is highly correlated with ID_REGIONA and 2 other fields High correlation
ID_OCUPA_N is highly correlated with CS_RACA and 5 other fields High correlation
COMUNINF is highly correlated with COUFINF and 10 other fields High correlation
CLASSI_FIN is highly correlated with COUFINF and 11 other fields High correlation
LOC_INF is highly correlated with COUFINF and 9 other fields High correlation
COPAISINF is highly correlated with PMM and 7 other fields High correlation
DSTRAESQUE is highly correlated with COUFINF and 10 other fields High correlation
TPAUTOCTO is highly correlated with COUFINF and 10 other fields High correlation
CS_GESTANT is highly correlated with CS_SEXO High correlation
TRA_ESQUEM is highly correlated with COUFINF and 11 other fields High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
AT_ATIVIDA is highly correlated with COUFINF and 9 other fields High correlation
CS_SEXO is highly correlated with CS_GESTANT High correlation
PCRUZ is highly correlated with COUFINF and 11 other fields High correlation
ID_MN_RESI is highly correlated with CS_RACA and 8 other fields High correlation
COUFINF is highly correlated with DTRATA and 8 other fields High correlation
ID_REGIONA is highly correlated with ID_PAIS and 8 other fields High correlation
DTRATA is highly correlated with COUFINF and 14 other fields High correlation
CS_ESCOL_N is highly correlated with ID_PAIS and 5 other fields High correlation
ID_OCUPA_N is highly correlated with ID_PAIS and 5 other fields High correlation
DSTRAESQUE is highly correlated with DTRATA and 6 other fields High correlation
ID_PAIS is highly correlated with COUFINF and 24 other fields High correlation
NU_ANO is highly correlated with COUFINF and 24 other fields High correlation
CS_SEXO is highly correlated with ID_PAIS and 6 other fields High correlation
LOC_INF is highly correlated with DTRATA and 10 other fields High correlation
SG_UF is highly correlated with COUFINF and 24 other fields High correlation
CS_RACA is highly correlated with ID_PAIS and 5 other fields High correlation
RESULT is highly correlated with DTRATA and 12 other fields High correlation
AT_SINTOMA is highly correlated with ID_REGIONA and 11 other fields High correlation
SG_UF_NOT is highly correlated with ID_REGIONA and 8 other fields High correlation
TP_NOT is highly correlated with COUFINF and 24 other fields High correlation
AT_LAMINA is highly correlated with ID_PAIS and 9 other fields High correlation
COMUNINF is highly correlated with COUFINF and 11 other fields High correlation
TPAUTOCTO is highly correlated with COUFINF and 13 other fields High correlation
ID_AGRAVO is highly correlated with COUFINF and 24 other fields High correlation
CS_GESTANT is highly correlated with ID_PAIS and 6 other fields High correlation
ID_RG_RESI is highly correlated with COUFINF and 24 other fields High correlation
TRA_ESQUEM is highly correlated with DTRATA and 11 other fields High correlation
AT_ATIVIDA is highly correlated with ID_REGIONA and 10 other fields High correlation
CLASSI_FIN is highly correlated with DTRATA and 13 other fields High correlation
PCRUZ is highly correlated with DTRATA and 9 other fields High correlation
DT_INVEST has 228 (100.0%) missing values Missing
PMM has 182 (79.8%) missing values Missing
DT_ENCERRA has 228 (100.0%) missing values Missing
DEXAME is uniformly distributed Uniform
DT_INVEST is an unsupported type, check if it needs cleaning or further analysis Unsupported
DT_ENCERRA is an unsupported type, check if it needs cleaning or further analysis Unsupported
COPAISINF has 171 (75.0%) zeros Zeros

Reproduction

Analysis started: 2021-07-06 18:42:56.952565
Analysis finished: 2021-07-06 18:43:16.782033
Duration: 19.83 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
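
The reported duration can be checked against the two timestamps. A small sketch using only the Python standard library; the format string is an assumption that matches how the timestamps are printed above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-07-06 18:42:56.952565", FMT)
finished = datetime.strptime("2021-07-06 18:43:16.782033", FMT)

# Elapsed wall-clock time of the profiling run, in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```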

Variables

TP_NOT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 13.0 KiB

Value  Count
2      228

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 228
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count  Frequency (%)
2      228    100.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
2      228    100.0%

Most occurring characters

Value  Count  Frequency (%)
2      228    100.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  228    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      228    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  228    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      228    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  228    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      228    100.0%

ID_AGRAVO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size17.0 KiB
B54
228 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters684
Distinct characters3
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB54
2nd rowB54
3rd rowB54
4th rowB54
5th rowB54

Common Values

ValueCountFrequency (%)
B54228
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
b54228
100.0%

Most occurring characters

ValueCountFrequency (%)
B228
33.3%
5228
33.3%
4228
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number456
66.7%
Uppercase Letter228
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5228
50.0%
4228
50.0%
Uppercase Letter
ValueCountFrequency (%)
B228
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common456
66.7%
Latin228
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
5228
50.0%
4228
50.0%
Latin
ValueCountFrequency (%)
B228
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII684
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B228
33.3%
5228
33.3%
4228
33.3%
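
The category counts for ID_AGRAVO can be reproduced from the Unicode character database. A minimal sketch using Python's standard `unicodedata` module; the report's own implementation may differ, and the mapping from two-letter category codes to display names is written by hand here:

```python
import unicodedata
from collections import Counter

# ID_AGRAVO is the constant string "B54" repeated over 228 observations.
values = ["B54"] * 228

# Hand-written labels for the two general categories that occur here.
CATEGORY_NAMES = {"Lu": "Uppercase Letter", "Nd": "Decimal Number"}

# Count every character by its Unicode general category.
category_counts = Counter(
    CATEGORY_NAMES[unicodedata.category(ch)] for value in values for ch in value
)

total_chars = sum(category_counts.values())
for name, count in category_counts.most_common():
    print(f"{name}: {count} ({100 * count / total_chars:.1f}%)")
```

This yields 456 decimal digits and 228 uppercase letters out of 684 characters, matching the table of most occurring categories.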

DateTime

Distinct: 141
Distinct (%): 61.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB
Minimum: 2014-01-03 00:00:00
Maximum: 2014-12-29 00:00:00
Histogram with fixed size bins (bins=50)

SEM_NOT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct52
Distinct (%)22.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean201422.5789
Minimum201401
Maximum201453
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.9 KiB

Quantile statistics

Minimum201401
5-th percentile201403
Q1201410.75
median201419
Q3201434.25
95-th percentile201450
Maximum201453
Range52
Interquartile range (IQR)23.5

Descriptive statistics

Standard deviation: 14.69567633
Coefficient of variation (CV): 7.295942892 × 10⁻⁵
Kurtosis: -0.8633461644
Mean: 201422.5789
Median Absolute Deviation (MAD): 11
Skewness: 0.4892538559
Sum: 45924348
Variance: 215.9629029
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
201407             10     4.4%
201419             10     4.4%
201418             9      3.9%
201408             9      3.9%
201420             8      3.5%
201412             7      3.1%
201417             7      3.1%
201415             7      3.1%
201437             7      3.1%
201403             7      3.1%
Other values (42)  147    64.5%

Value   Count  Frequency (%)
201401  3      1.3%
201402  6      2.6%
201403  7      3.1%
201404  5      2.2%
201405  5      2.2%
201406  3      1.3%
201407  10     4.4%
201408  9      3.9%
201409  6      2.6%
201410  3      1.3%

Value   Count  Frequency (%)
201453  2      0.9%
201452  3      1.3%
201451  4      1.8%
201450  4      1.8%
201449  3      1.3%
201448  6      2.6%
201447  2      0.9%
201446  1      0.4%
201444  4      1.8%
201443  3      1.3%
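
SEM_NOT is profiled as a real number, but its values look like a year and a week number packed together as YYYYWW, so statistics such as the mean or Q1 = 201410.75 have no direct meaning. A sketch of unpacking the code, assuming the YYYYWW layout (the function name is hypothetical):

```python
def split_year_week(code: int):
    """Split a YYYYWW integer into a (year, week) pair."""
    return divmod(code, 100)

# Minimum, median and maximum of SEM_NOT as reported above.
for code in (201401, 201419, 201453):
    year, week = split_year_week(code)
    print(f"{code} -> year {year}, week {week}")
```

Treating the two parts separately avoids artefacts such as a numeric range that jumps by 48 between week 52 of one year and week 1 of the next.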

NU_ANO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size13.7 KiB
2014
228 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters912
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2014
2nd row2014
3rd row2014
4th row2014
5th row2014

Common Values

ValueCountFrequency (%)
2014228
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2014228
100.0%

Most occurring characters

ValueCountFrequency (%)
2228
25.0%
0228
25.0%
1228
25.0%
4228
25.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number912
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2228
25.0%
0228
25.0%
1228
25.0%
4228
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common912
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2228
25.0%
0228
25.0%
1228
25.0%
4228
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII912
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2228
25.0%
0228
25.0%
1228
25.0%
4228
25.0%

SG_UF_NOT
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)1.8%
Missing0
Missing (%)0.0%
Memory size13.3 KiB
33
225 
29
 
1
35
 
1
41
 
1

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters456
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.3%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33225
98.7%
291
 
0.4%
351
 
0.4%
411
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33225
98.7%
411
 
0.4%
351
 
0.4%
291
 
0.4%

Most occurring characters

ValueCountFrequency (%)
3451
98.9%
51
 
0.2%
41
 
0.2%
11
 
0.2%
21
 
0.2%
91
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number456
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3451
98.9%
51
 
0.2%
41
 
0.2%
11
 
0.2%
21
 
0.2%
91
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common456
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3451
98.9%
51
 
0.2%
41
 
0.2%
11
 
0.2%
21
 
0.2%
91
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII456
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3451
98.9%
51
 
0.2%
41
 
0.2%
11
 
0.2%
21
 
0.2%
91
 
0.2%

ID_MUNICIP
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct23
Distinct (%)10.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean330721.807
Minimum292740
Maximum410830
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.9 KiB

Quantile statistics

Minimum292740
5-th percentile330214
Q1330455
median330455
Q3330455
95-th percentile330455
Maximum410830
Range118090
Interquartile range (IQR)0

Descriptive statistics

Standard deviation6109.309134
Coefficient of variation (CV)0.01847265286
Kurtosis138.5700878
Mean330721.807
Median Absolute Deviation (MAD)0
Skewness9.231550952
Sum75404572
Variance37323658.09
MonotonicityNot monotonic
Histogram with fixed size bins (bins=23)
ValueCountFrequency (%)
330455187
82.0%
33024010
 
4.4%
3301005
 
2.2%
3306303
 
1.3%
3300802
 
0.9%
3304522
 
0.9%
3304602
 
0.9%
3303302
 
0.9%
3302001
 
0.4%
3302501
 
0.4%
Other values (13)13
 
5.7%
ValueCountFrequency (%)
2927401
 
0.4%
3300101
 
0.4%
3300231
 
0.4%
3300701
 
0.4%
3300802
 
0.9%
3301005
2.2%
3302001
 
0.4%
33024010
4.4%
3302501
 
0.4%
3303201
 
0.4%
ValueCountFrequency (%)
4108301
 
0.4%
3550301
 
0.4%
3306303
 
1.3%
3306001
 
0.4%
3304901
 
0.4%
3304602
 
0.9%
330455187
82.0%
3304522
 
0.9%
3304201
 
0.4%
3303801
 
0.4%
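
One plausible reason for the correlation between ID_MUNICIP and SG_UF_NOT is that the municipality code embeds the state code. A sketch assuming ID_MUNICIP holds six-digit IBGE municipality codes whose first two digits are the state (UF) code; that assumption is not stated in the report itself, and the function name is hypothetical:

```python
# A few ID_MUNICIP values taken from the frequency tables above.
municipality_codes = [330455, 330240, 292740, 355030, 410830]

def uf_from_municipality(code: int) -> int:
    """Return the two leading digits of a six-digit municipality code."""
    return code // 10_000

uf_codes = [uf_from_municipality(c) for c in municipality_codes]
print(uf_codes)
```

The prefixes 29, 35 and 41 coincide with the three SG_UF_NOT values other than 33, each of which occurs exactly once.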

ID_REGIONA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)1.8%
Missing0
Missing (%)0.0%
Memory size13.8 KiB
225 
1331
 
1
1363
 
1
1380
 
1

Length

Max length4
Median length0
Mean length0.05263157895
Min length0

Characters and Unicode

Total characters12
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.3%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value    Count  Frequency (%)
(empty)  225    98.7%
1331     1      0.4%
1363     1      0.4%
1380     1      0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
13311
33.3%
13631
33.3%
13801
33.3%

Most occurring characters

ValueCountFrequency (%)
35
41.7%
14
33.3%
61
 
8.3%
81
 
8.3%
01
 
8.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number12
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
35
41.7%
14
33.3%
61
 
8.3%
81
 
8.3%
01
 
8.3%

Most occurring scripts

ValueCountFrequency (%)
Common12
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
35
41.7%
14
33.3%
61
 
8.3%
81
 
8.3%
01
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII12
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
35
41.7%
14
33.3%
61
 
8.3%
81
 
8.3%
01
 
8.3%

ID_UNIDADE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct61
Distinct (%)26.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3529412.575
Minimum12521
Maximum6870066
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.9 KiB

Quantile statistics

Minimum12521
5-th percentile2273408.5
Q12288338
median2343918
Q35476321
95-th percentile6160689.65
Maximum6870066
Range6857545
Interquartile range (IQR)3187983

Descriptive statistics

Standard deviation: 1602471.123
Coefficient of variation (CV): 0.4540333807
Kurtosis: -1.111094521
Mean: 3529412.575
Median Absolute Deviation (MAD): 358650.5
Skewness: 0.6282308384
Sum: 804706067
Variance: 2.5679137 × 10¹²
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
228833879
34.6%
547632153
23.2%
300599215
 
6.6%
227653410
 
4.4%
33754714
 
1.8%
30349843
 
1.3%
60439413
 
1.3%
22699882
 
0.9%
22685072
 
0.9%
22875792
 
0.9%
Other values (51)55
24.1%
ValueCountFrequency (%)
125211
0.4%
251351
0.4%
20892381
0.4%
22685072
0.9%
22697831
0.4%
22698051
0.4%
22699882
0.9%
22702341
0.4%
22705441
0.4%
22733491
0.4%
ValueCountFrequency (%)
68700661
0.4%
68583171
0.4%
67534691
0.4%
66815731
0.4%
66460341
0.4%
66454021
0.4%
66351481
0.4%
66299541
0.4%
66079501
0.4%
65595651
0.4%

DateTime

Distinct: 158
Distinct (%): 69.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB
Minimum: 2013-11-14 00:00:00
Maximum: 2014-12-26 00:00:00
Histogram with fixed size bins (bins=50)

SEM_PRI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct57
Distinct (%)25.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean201419.2149
Minimum201346
Maximum201452
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.9 KiB

Quantile statistics

Minimum201346
5-th percentile201401
Q1201408
median201418
Q3201432.25
95-th percentile201449.65
Maximum201452
Range106
Interquartile range (IQR)24.25

Descriptive statistics

Standard deviation: 19.67382363
Coefficient of variation (CV): 9.767600194 × 10⁻⁵
Kurtosis: 3.408497013
Mean: 201419.2149
Median Absolute Deviation (MAD): 11
Skewness: -1.143194716
Sum: 45923581
Variance: 387.0593361
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
20141413
 
5.7%
20141812
 
5.3%
20140211
 
4.8%
20140711
 
4.8%
20142010
 
4.4%
20140610
 
4.4%
2014117
 
3.1%
2014247
 
3.1%
2014016
 
2.6%
2014516
 
2.6%
Other values (47)135
59.2%
ValueCountFrequency (%)
2013461
 
0.4%
2013481
 
0.4%
2013492
 
0.9%
2013511
 
0.4%
2013523
 
1.3%
2014016
2.6%
20140211
4.8%
2014035
2.2%
2014043
 
1.3%
2014052
 
0.9%
ValueCountFrequency (%)
2014523
1.3%
2014516
2.6%
2014503
1.3%
2014492
 
0.9%
2014482
 
0.9%
2014474
1.8%
2014461
 
0.4%
2014453
1.3%
2014442
 
0.9%
2014432
 
0.9%

DateTime

Distinct: 222
Distinct (%): 97.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB
Minimum: 1934-07-07 00:00:00
Maximum: 2014-04-30 00:00:00
Histogram with fixed size bins (bins=50)

NU_IDADE_N
Real number (ℝ≥0)

Distinct65
Distinct (%)28.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4028.789474
Minimum2000
Maximum4079
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.9 KiB

Quantile statistics

Minimum2000
5-th percentile4012.35
Q14029
median4037
Q34047
95-th percentile4062
Maximum4079
Range2079
Interquartile range (IQR)18

Descriptive statistics

Standard deviation135.7695945
Coefficient of variation (CV)0.03369984841
Kurtosis222.4839687
Mean4028.789474
Median Absolute Deviation (MAD)9
Skewness-14.82608926
Sum918564
Variance18433.3828
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
403015
 
6.6%
404111
 
4.8%
403210
 
4.4%
40449
 
3.9%
40378
 
3.5%
40457
 
3.1%
40467
 
3.1%
40267
 
3.1%
40337
 
3.1%
40356
 
2.6%
Other values (55)141
61.8%
ValueCountFrequency (%)
20001
 
0.4%
40013
1.3%
40021
 
0.4%
40041
 
0.4%
40051
 
0.4%
40081
 
0.4%
40091
 
0.4%
40112
0.9%
40121
 
0.4%
40134
1.8%
ValueCountFrequency (%)
40791
 
0.4%
40731
 
0.4%
40711
 
0.4%
40681
 
0.4%
40661
 
0.4%
40651
 
0.4%
40644
1.8%
40631
 
0.4%
40622
 
0.9%
40615
2.2%
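
The NU_IDADE_N values cluster between 4001 and 4079 with a single outlier at 2000, which suggests a coded age and not a plain number. A sketch of decoding it, assuming the SINAN convention in which the leading digit is the unit (1 = hours, 2 = days, 3 = months, 4 = years) and the remaining three digits are the value; this convention comes from outside the report and the function name is hypothetical:

```python
# Unit encoded by the leading digit (assumed SINAN convention).
UNITS = {1: "hours", 2: "days", 3: "months", 4: "years"}

def decode_age(code: int):
    """Split a four-digit age code into a (value, unit) pair."""
    unit_digit, value = divmod(code, 1000)
    return value, UNITS[unit_digit]

# Minimum, median and maximum of NU_IDADE_N as reported above.
for code in (2000, 4037, 4079):
    value, unit = decode_age(code)
    print(f"{code} -> {value} {unit}")
```

Under that assumption the median patient is 37 years old, and the mean, skewness and kurtosis reported above are distorted by the one record coded in days.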

CS_SEXO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct2
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Memory size14.8 KiB
M
156 
F
72 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters228
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowM
2nd rowF
3rd rowM
4th rowF
5th rowM

Common Values

ValueCountFrequency (%)
M156
68.4%
F72
31.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
m156
68.4%
f72
31.6%

Most occurring characters

ValueCountFrequency (%)
M156
68.4%
F72
31.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter228
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M156
68.4%
F72
31.6%

Most occurring scripts

ValueCountFrequency (%)
Latin228
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M156
68.4%
F72
31.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII228
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M156
68.4%
F72
31.6%

CS_GESTANT
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.3%
Missing0
Missing (%)0.0%
Memory size13.0 KiB
6
166 
5
54 
9
 
8

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters228
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row6
2nd row5
3rd row6
4th row5
5th row6

Common Values

ValueCountFrequency (%)
6166
72.8%
554
 
23.7%
98
 
3.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
6166
72.8%
554
 
23.7%
98
 
3.5%

Most occurring characters

ValueCountFrequency (%)
6166
72.8%
554
 
23.7%
98
 
3.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number228
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
6166
72.8%
554
 
23.7%
98
 
3.5%

Most occurring scripts

ValueCountFrequency (%)
Common228
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
6166
72.8%
554
 
23.7%
98
 
3.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII228
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6166
72.8%
554
 
23.7%
98
 
3.5%

CS_RACA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct5
Distinct (%)2.2%
Missing0
Missing (%)0.0%
Memory size14.8 KiB
1
117 
9
36 
4
35 
2
34 
 
6

Length

Max length1
Median length1
Mean length0.9736842105
Min length0

Characters and Unicode

Total characters222
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row4
3rd row1
4th row1
5th row1

Common Values

Value    Count  Frequency (%)
1        117    51.3%
9        36     15.8%
4        35     15.4%
2        34     14.9%
(empty)  6      2.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1117
52.7%
936
 
16.2%
435
 
15.8%
234
 
15.3%

Most occurring characters

ValueCountFrequency (%)
1117
52.7%
936
 
16.2%
435
 
15.8%
234
 
15.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number222
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1117
52.7%
936
 
16.2%
435
 
15.8%
234
 
15.3%

Most occurring scripts

ValueCountFrequency (%)
Common222
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1117
52.7%
936
 
16.2%
435
 
15.8%
234
 
15.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII222
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1117
52.7%
936
 
16.2%
435
 
15.8%
234
 
15.3%
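
The CS_RACA percentages differ between the Common Values table and the character-level tables because the denominators differ: the former counts all 228 rows, the latter only the 222 rows with a non-empty value. A short sketch of both calculations, with the empty value written as `""` for illustration:

```python
# CS_RACA counts from the Common Values table; "" is the empty value.
counts = {"1": 117, "9": 36, "4": 35, "2": 34, "": 6}

total = sum(counts.values())
non_empty = total - counts[""]

for value, count in counts.items():
    if value == "":
        continue
    share_all = 100 * count / total
    share_non_empty = 100 * count / non_empty
    print(f"{value}: {share_all:.1f}% of all rows, {share_non_empty:.1f}% of non-empty rows")
```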

CS_ESCOL_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct12
Distinct (%)5.3%
Missing0
Missing (%)0.0%
Memory size13.3 KiB
08
89 
09
35 
06
26 
24 
07
15 
Other values (7)
39 

Length

Max length2
Median length2
Mean length1.789473684
Min length0

Characters and Unicode

Total characters408
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.4%

Sample

1st row09
2nd row07
3rd row08
4th row08
5th row

Common Values

Value             Count  Frequency (%)
08                89     39.0%
09                35     15.4%
06                26     11.4%
(empty)           24     10.5%
07                15     6.6%
05                9      3.9%
10                7      3.1%
01                7      3.1%
04                7      3.1%
03                5      2.2%
Other values (2)  4      1.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0889
43.6%
0935
 
17.2%
0626
 
12.7%
0715
 
7.4%
059
 
4.4%
107
 
3.4%
017
 
3.4%
047
 
3.4%
035
 
2.5%
023
 
1.5%

Most occurring characters

ValueCountFrequency (%)
0205
50.2%
889
21.8%
935
 
8.6%
626
 
6.4%
715
 
3.7%
114
 
3.4%
59
 
2.2%
47
 
1.7%
35
 
1.2%
23
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number408
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0205
50.2%
889
21.8%
935
 
8.6%
626
 
6.4%
715
 
3.7%
114
 
3.4%
59
 
2.2%
47
 
1.7%
35
 
1.2%
23
 
0.7%

Most occurring scripts

ValueCountFrequency (%)
Common408
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0205
50.2%
889
21.8%
935
 
8.6%
626
 
6.4%
715
 
3.7%
114
 
3.4%
59
 
2.2%
47
 
1.7%
35
 
1.2%
23
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII408
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0205
50.2%
889
21.8%
935
 
8.6%
626
 
6.4%
715
 
3.7%
114
 
3.4%
59
 
2.2%
47
 
1.7%
35
 
1.2%
23
 
0.7%

SG_UF
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size13.3 KiB
33
228 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters456
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33228
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33228
100.0%

Most occurring characters

ValueCountFrequency (%)
3456
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number456
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3456
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common456
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3456
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII456
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3456
100.0%

ID_MN_RESI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct29
Distinct (%)12.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean330407.7675
Minimum330010
Maximum330630
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.9 KiB

Quantile statistics

Minimum330010
5-th percentile330100
Q1330455
median330455
Q3330455
95-th percentile330490
Maximum330630
Range620
Interquartile range (IQR)0

Descriptive statistics

Standard deviation115.5215463
Coefficient of variation (CV)0.0003496332643
Kurtosis2.722874274
Mean330407.7675
Median Absolute Deviation (MAD)0
Skewness-1.793181349
Sum75332971
Variance13345.22766
MonotonicityNot monotonic
Histogram with fixed size bins (bins=29)
ValueCountFrequency (%)
330455158
69.3%
33049010
 
4.4%
3302409
 
3.9%
3303506
 
2.6%
3306304
 
1.8%
3301004
 
1.8%
3303304
 
1.8%
3304523
 
1.3%
3301703
 
1.3%
3300802
 
0.9%
Other values (19)25
 
11.0%
ValueCountFrequency (%)
3300101
 
0.4%
3300151
 
0.4%
3300231
 
0.4%
3300702
0.9%
3300802
0.9%
3300901
 
0.4%
3300931
 
0.4%
3300951
 
0.4%
3301004
1.8%
3301301
 
0.4%
ValueCountFrequency (%)
3306304
 
1.8%
3305101
 
0.4%
33049010
 
4.4%
3304602
 
0.9%
330455158
69.3%
3304523
 
1.3%
3304201
 
0.4%
3303801
 
0.4%
3303506
 
2.6%
3303401
 
0.4%

ID_RG_RESI
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size13.7 KiB
228 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value    Count  Frequency (%)
(empty)  228    100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

ID_PAIS
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size13.0 KiB
1
228 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters228
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1228
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1228
100.0%

Most occurring characters

ValueCountFrequency (%)
1228
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number228
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1228
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common228
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1228
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII228
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1228
100.0%

DT_INVEST
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing228
Missing (%)100.0%
Memory size1.9 KiB

ID_OCUPA_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct40
Distinct (%)17.5%
Missing0
Missing (%)0.0%
Memory size13.9 KiB
144 
999991
17 
241005
 
10
999993
 
6
214205
 
5
Other values (35)
46 

Length

Max length6
Median length0
Mean length2.210526316
Min length0

Characters and Unicode

Total characters504
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique27 ?
Unique (%)11.8%

Sample

1st row
2nd row
3rd row214205
4th row
5th row

Common Values

Value              Count  Frequency (%)
(empty)            144    63.2%
999991             17     7.5%
241005             10     4.4%
999993             6      2.6%
214205             5      2.2%
252105             4      1.8%
263110             3      1.3%
223505             2      0.9%
999992             2      0.9%
251605             2      0.9%
Other values (30)  33     14.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
99999117
20.2%
24100510
 
11.9%
9999936
 
7.1%
2142055
 
6.0%
2521054
 
4.8%
2631103
 
3.6%
2516052
 
2.4%
2211052
 
2.4%
1414102
 
2.4%
2235052
 
2.4%
Other values (29)31
36.9%

Most occurring characters

ValueCountFrequency (%)
9130
25.8%
194
18.7%
278
15.5%
063
12.5%
560
11.9%
434
 
6.7%
322
 
4.4%
710
 
2.0%
69
 
1.8%
84
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number504
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
9130
25.8%
194
18.7%
278
15.5%
063
12.5%
560
11.9%
434
 
6.7%
322
 
4.4%
710
 
2.0%
69
 
1.8%
84
 
0.8%

Most occurring scripts

ValueCountFrequency (%)
Common504
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
9130
25.8%
194
18.7%
278
15.5%
063
12.5%
560
11.9%
434
 
6.7%
322
 
4.4%
710
 
2.0%
69
 
1.8%
84
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII504
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9130
25.8%
194
18.7%
278
15.5%
063
12.5%
560
11.9%
434
 
6.7%
322
 
4.4%
710
 
2.0%
69
 
1.8%
84
 
0.8%

CLASSI_FIN
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.3%
Missing0
Missing (%)0.0%
Memory size14.8 KiB
Value | Count
2 | 169
1 | 57
(empty) | 2

Length

Max length1
Median length1
Mean length0.9912280702
Min length0

Characters and Unicode

Total characters226
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row2
3rd row1
4th row2
5th row2

Common Values

ValueCountFrequency (%)
2169
74.1%
157
 
25.0%
2
 
0.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2169
74.8%
157
 
25.2%

Most occurring characters

ValueCountFrequency (%)
2169
74.8%
157
 
25.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number226
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2169
74.8%
157
 
25.2%

Most occurring scripts

ValueCountFrequency (%)
Common226
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2169
74.8%
157
 
25.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII226
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2169
74.8%
157
 
25.2%

AT_ATIVIDA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct10
Distinct (%)4.4%
Missing0
Missing (%)0.0%
Memory size15.8 KiB
Value | Count
11 | 114
10 | 69
99 | 18
4 | 13
3 | 5
Other values (5) | 9

Length

Max length2
Median length2
Mean length1.872807018
Min length0

Characters and Unicode

Total characters427
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.4%

Sample

1st row
2nd row3
3rd row11
4th row11
5th row11

Common Values

ValueCountFrequency (%)
11114
50.0%
1069
30.3%
9918
 
7.9%
413
 
5.7%
35
 
2.2%
12
 
0.9%
62
 
0.9%
92
 
0.9%
2
 
0.9%
51
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
11114
50.4%
1069
30.5%
9918
 
8.0%
413
 
5.8%
35
 
2.2%
12
 
0.9%
62
 
0.9%
92
 
0.9%
51
 
0.4%

Most occurring characters

ValueCountFrequency (%)
1299
70.0%
069
 
16.2%
938
 
8.9%
413
 
3.0%
35
 
1.2%
62
 
0.5%
51
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number427
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1299
70.0%
069
 
16.2%
938
 
8.9%
413
 
3.0%
35
 
1.2%
62
 
0.5%
51
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common427
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1299
70.0%
069
 
16.2%
938
 
8.9%
413
 
3.0%
35
 
1.2%
62
 
0.5%
51
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII427
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1299
70.0%
069
 
16.2%
938
 
8.9%
413
 
3.0%
35
 
1.2%
62
 
0.5%
51
 
0.2%

AT_LAMINA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)1.8%
Missing0
Missing (%)0.0%
Memory size14.8 KiB
Value | Count
1 | 117
2 | 99
3 | 10
(empty) | 2

Length

Max length1
Median length1
Mean length0.9912280702
Min length0

Characters and Unicode

Total characters226
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row2
3rd row2
4th row2
5th row2

Common Values

ValueCountFrequency (%)
1117
51.3%
299
43.4%
310
 
4.4%
2
 
0.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1117
51.8%
299
43.8%
310
 
4.4%

Most occurring characters

ValueCountFrequency (%)
1117
51.8%
299
43.8%
310
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number226
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1117
51.8%
299
43.8%
310
 
4.4%

Most occurring scripts

ValueCountFrequency (%)
Common226
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1117
51.8%
299
43.8%
310
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII226
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1117
51.8%
299
43.8%
310
 
4.4%

AT_SINTOMA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.3%
Missing0
Missing (%)0.0%
Memory size14.8 KiB
Value | Count
1 | 213
2 | 13
(empty) | 2

Length

Max length1
Median length1
Mean length0.9912280702
Min length0

Characters and Unicode

Total characters226
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1213
93.4%
213
 
5.7%
2
 
0.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1213
94.2%
213
 
5.8%

Most occurring characters

ValueCountFrequency (%)
1213
94.2%
213
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number226
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1213
94.2%
213
 
5.8%

Most occurring scripts

ValueCountFrequency (%)
Common226
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1213
94.2%
213
 
5.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII226
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1213
94.2%
213
 
5.8%

TPAUTOCTO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.3%
Missing0
Missing (%)0.0%
Memory size14.0 KiB
Value | Count
(empty) | 171
2 | 51
1 | 6

Length

Max length1
Median length0
Mean length0.25
Min length0

Characters and Unicode

Total characters57
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row2
4th row
5th row

Common Values

ValueCountFrequency (%)
171
75.0%
251
 
22.4%
16
 
2.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
251
89.5%
16
 
10.5%

Most occurring characters

ValueCountFrequency (%)
251
89.5%
16
 
10.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number57
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
251
89.5%
16
 
10.5%

Most occurring scripts

ValueCountFrequency (%)
Common57
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
251
89.5%
16
 
10.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII57
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
251
89.5%
16
 
10.5%

COUFINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct10
Distinct (%)4.4%
Missing0
Missing (%)0.0%
Memory size14.0 KiB
Value | Count
(empty) | 197
AM | 8
RJ | 8
RO | 4
AP | 3
Other values (5) | 8

Length

Max length2
Median length0
Mean length0.2719298246
Min length0

Characters and Unicode

Total characters62
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.9%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
197
86.4%
AM8
 
3.5%
RJ8
 
3.5%
RO4
 
1.8%
AP3
 
1.3%
RR2
 
0.9%
AC2
 
0.9%
PA2
 
0.9%
TO1
 
0.4%
MA1
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
am8
25.8%
rj8
25.8%
ro4
12.9%
ap3
 
9.7%
rr2
 
6.5%
ac2
 
6.5%
pa2
 
6.5%
to1
 
3.2%
ma1
 
3.2%

Most occurring characters

ValueCountFrequency (%)
R16
25.8%
A16
25.8%
M9
14.5%
J8
12.9%
O5
 
8.1%
P5
 
8.1%
C2
 
3.2%
T1
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter62
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R16
25.8%
A16
25.8%
M9
14.5%
J8
12.9%
O5
 
8.1%
P5
 
8.1%
C2
 
3.2%
T1
 
1.6%

Most occurring scripts

ValueCountFrequency (%)
Latin62
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
R16
25.8%
A16
25.8%
M9
14.5%
J8
12.9%
O5
 
8.1%
P5
 
8.1%
C2
 
3.2%
T1
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII62
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
R16
25.8%
A16
25.8%
M9
14.5%
J8
12.9%
O5
 
8.1%
P5
 
8.1%
C2
 
3.2%
T1
 
1.6%

COPAISINF
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct12
Distinct (%)5.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean8.096491228
Minimum0
Maximum177
Zeros171
Zeros (%)75.0%
Negative0
Negative (%)0.0%
Memory size1.9 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30.25
95-th percentile31
Maximum177
Range177
Interquartile range (IQR)0.25

Descriptive statistics

Standard deviation29.61930829
Coefficient of variation (CV)3.658289432
Kurtosis21.47358046
Mean8.096491228
Median Absolute Deviation (MAD)0
Skewness4.584692932
Sum1846
Variance877.3034238
MonotonicityNot monotonic
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
0171
75.0%
131
 
13.6%
3113
 
5.7%
1774
 
1.8%
222
 
0.9%
1591
 
0.4%
1131
 
0.4%
1111
 
0.4%
1091
 
0.4%
841
 
0.4%
Other values (2)2
 
0.9%
ValueCountFrequency (%)
0171
75.0%
131
 
13.6%
71
 
0.4%
222
 
0.9%
3113
 
5.7%
771
 
0.4%
841
 
0.4%
1091
 
0.4%
1111
 
0.4%
1131
 
0.4%
ValueCountFrequency (%)
1774
 
1.8%
1591
 
0.4%
1131
 
0.4%
1111
 
0.4%
1091
 
0.4%
841
 
0.4%
771
 
0.4%
3113
5.7%
222
 
0.9%
71
 
0.4%

COMUNINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct20
Distinct (%)8.8%
Missing0
Missing (%)0.0%
Memory size13.8 KiB
Value | Count
(empty) | 197
130260 | 4
330080 | 3
110020 | 3
120020 | 2
Other values (15) | 19

Length

Max length6
Median length0
Mean length0.8157894737
Min length0

Characters and Unicode

Total characters186
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique11 ?
Unique (%)4.8%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
197
86.4%
1302604
 
1.8%
3300803
 
1.3%
1100203
 
1.3%
1200202
 
0.9%
1303802
 
0.9%
1600302
 
0.9%
3304602
 
0.9%
3302402
 
0.9%
2111301
 
0.4%
Other values (10)10
 
4.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1302604
12.9%
1100203
 
9.7%
3300803
 
9.7%
1600302
 
6.5%
1200202
 
6.5%
1303802
 
6.5%
3302402
 
6.5%
3304602
 
6.5%
1100021
 
3.2%
1721001
 
3.2%
Other values (9)9
29.0%

Most occurring characters

ValueCountFrequency (%)
076
40.9%
132
17.2%
330
 
16.1%
218
 
9.7%
610
 
5.4%
48
 
4.3%
86
 
3.2%
73
 
1.6%
52
 
1.1%
91
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number186
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
076
40.9%
132
17.2%
330
 
16.1%
218
 
9.7%
610
 
5.4%
48
 
4.3%
86
 
3.2%
73
 
1.6%
52
 
1.1%
91
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Common186
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
076
40.9%
132
17.2%
330
 
16.1%
218
 
9.7%
610
 
5.4%
48
 
4.3%
86
 
3.2%
73
 
1.6%
52
 
1.1%
91
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII186
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
076
40.9%
132
17.2%
330
 
16.1%
218
 
9.7%
610
 
5.4%
48
 
4.3%
86
 
3.2%
73
 
1.6%
52
 
1.1%
91
 
0.5%

LOC_INF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct13
Distinct (%)5.7%
Missing0
Missing (%)0.0%
Memory size13.7 KiB
Value | Count
(empty) | 216
MACH | 1
SERR | 1
ACRE | 1
LUAN | 1
Other values (8) | 8

Length

Max length4
Median length0
Mean length0.2061403509
Min length0

Characters and Unicode

Total characters47
Distinct characters15
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique12 ?
Unique (%)5.3%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
216
94.7%
MACH1
 
0.4%
SERR1
 
0.4%
ACRE1
 
0.4%
LUAN1
 
0.4%
AFRI1
 
0.4%
SANA1
 
0.4%
RORA1
 
0.4%
CEU1
 
0.4%
MANA1
 
0.4%
Other values (3)3
 
1.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ango1
8.3%
ceu1
8.3%
sana1
8.3%
maca1
8.3%
rora1
8.3%
mana1
8.3%
afri1
8.3%
acre1
8.3%
luan1
8.3%
serr1
8.3%
Other values (2)2
16.7%

Most occurring characters

ValueCountFrequency (%)
A13
27.7%
R6
12.8%
N4
 
8.5%
C4
 
8.5%
E4
 
8.5%
M3
 
6.4%
U2
 
4.3%
S2
 
4.3%
O2
 
4.3%
F2
 
4.3%
Other values (5)5
 
10.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter47
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A13
27.7%
R6
12.8%
N4
 
8.5%
C4
 
8.5%
E4
 
8.5%
M3
 
6.4%
U2
 
4.3%
S2
 
4.3%
O2
 
4.3%
F2
 
4.3%
Other values (5)5
 
10.6%

Most occurring scripts

ValueCountFrequency (%)
Latin47
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A13
27.7%
R6
12.8%
N4
 
8.5%
C4
 
8.5%
E4
 
8.5%
M3
 
6.4%
U2
 
4.3%
S2
 
4.3%
O2
 
4.3%
F2
 
4.3%
Other values (5)5
 
10.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII47
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A13
27.7%
R6
12.8%
N4
 
8.5%
C4
 
8.5%
E4
 
8.5%
M3
 
6.4%
U2
 
4.3%
S2
 
4.3%
O2
 
4.3%
F2
 
4.3%
Other values (5)5
 
10.6%

DEXAME
Categorical

HIGH CARDINALITY
UNIFORM

Distinct140
Distinct (%)61.4%
Missing0
Missing (%)0.0%
Memory size15.0 KiB
Value | Count
2014-02-20 | 4
2014-02-24 | 4
2014-04-02 | 4
2014-05-15 | 4
2014-01-14 | 4
Other values (135) | 208

Length

Max length10
Median length10
Mean length9.947368421
Min length4

Characters and Unicode

Total characters2268
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique78 ?
Unique (%)34.2%

Sample

1st rowNone
2nd row2014-01-03
3rd row2014-01-07
4th row2014-01-07
5th row2014-01-08

Common Values

ValueCountFrequency (%)
2014-02-204
 
1.8%
2014-02-244
 
1.8%
2014-04-024
 
1.8%
2014-05-154
 
1.8%
2014-01-144
 
1.8%
2014-06-134
 
1.8%
2014-02-144
 
1.8%
2014-05-133
 
1.3%
2014-03-103
 
1.3%
2014-04-243
 
1.3%
Other values (130)191
83.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2014-02-144
 
1.8%
2014-05-154
 
1.8%
2014-04-024
 
1.8%
2014-06-134
 
1.8%
2014-02-204
 
1.8%
2014-01-144
 
1.8%
2014-02-244
 
1.8%
2014-02-183
 
1.3%
2014-03-213
 
1.3%
2014-06-233
 
1.3%
Other values (130)191
83.8%

Most occurring characters

ValueCountFrequency (%)
0518
22.8%
-452
19.9%
1398
17.5%
2365
16.1%
4284
12.5%
356
 
2.5%
547
 
2.1%
641
 
1.8%
836
 
1.6%
936
 
1.6%
Other values (5)35
 
1.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1808
79.7%
Dash Punctuation452
 
19.9%
Lowercase Letter6
 
0.3%
Uppercase Letter2
 
0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0518
28.7%
1398
22.0%
2365
20.2%
4284
15.7%
356
 
3.1%
547
 
2.6%
641
 
2.3%
836
 
2.0%
936
 
2.0%
727
 
1.5%
Lowercase Letter
ValueCountFrequency (%)
o2
33.3%
n2
33.3%
e2
33.3%
Uppercase Letter
ValueCountFrequency (%)
N2
100.0%
Dash Punctuation
ValueCountFrequency (%)
-452
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2260
99.6%
Latin8
 
0.4%

Most frequent character per script

Common
ValueCountFrequency (%)
0518
22.9%
-452
20.0%
1398
17.6%
2365
16.2%
4284
12.6%
356
 
2.5%
547
 
2.1%
641
 
1.8%
836
 
1.6%
936
 
1.6%
Latin
ValueCountFrequency (%)
N2
25.0%
o2
25.0%
n2
25.0%
e2
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2268
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0518
22.8%
-452
19.9%
1398
17.5%
2365
16.1%
4284
12.5%
356
 
2.5%
547
 
2.1%
641
 
1.8%
836
 
1.6%
936
 
1.6%
Other values (5)35
 
1.5%

RESULT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)3.1%
Missing0
Missing (%)0.0%
Memory size14.8 KiB
Value | Count
1 | 169
4 | 29
2 | 24
5 | 2
(empty) | 2
Other values (2) | 2

Length

Max length2
Median length1
Mean length0.9956140351
Min length0

Characters and Unicode

Total characters227
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.9%

Sample

1st row
2nd row1
3rd row2
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1169
74.1%
429
 
12.7%
224
 
10.5%
52
 
0.9%
2
 
0.9%
101
 
0.4%
81
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1169
74.8%
429
 
12.8%
224
 
10.6%
52
 
0.9%
101
 
0.4%
81
 
0.4%

Most occurring characters

ValueCountFrequency (%)
1170
74.9%
429
 
12.8%
224
 
10.6%
52
 
0.9%
01
 
0.4%
81
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number227
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1170
74.9%
429
 
12.8%
224
 
10.6%
52
 
0.9%
01
 
0.4%
81
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common227
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1170
74.9%
429
 
12.8%
224
 
10.6%
52
 
0.9%
01
 
0.4%
81
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII227
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1170
74.9%
429
 
12.8%
224
 
10.6%
52
 
0.9%
01
 
0.4%
81
 
0.4%

PMM
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct41
Distinct (%)89.1%
Missing182
Missing (%)79.8%
Infinite0
Infinite (%)0.0%
Mean27167.80435
Minimum48
Maximum1000000
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.9 KiB

Quantile statistics

Minimum48
5-th percentile207
Q1382.5
median759
Q38855
95-th percentile30930
Maximum1000000
Range999952
Interquartile range (IQR)8472.5

Descriptive statistics

Standard deviation146912.4483
Coefficient of variation (CV)5.407593724
Kurtosis45.61468186
Mean27167.80435
Median Absolute Deviation (MAD)533
Skewness6.741148904
Sum1249719
Variance2.158326746 × 10^10
MonotonicityNot monotonic
Histogram with fixed size bins (bins=41)
ValueCountFrequency (%)
5013
 
1.3%
4002
 
0.9%
100002
 
0.9%
3802
 
0.9%
481
 
0.4%
5201
 
0.4%
3601
 
0.4%
8801
 
0.4%
2881
 
0.4%
601
 
0.4%
Other values (31)31
 
13.6%
(Missing)182
79.8%
ValueCountFrequency (%)
481
0.4%
601
0.4%
2061
0.4%
2101
0.4%
2121
0.4%
2401
0.4%
2881
0.4%
3011
0.4%
3601
0.4%
3701
0.4%
ValueCountFrequency (%)
10000001
0.4%
386401
0.4%
314001
0.4%
295201
0.4%
216801
0.4%
215801
0.4%
136441
0.4%
132401
0.4%
100021
0.4%
100002
0.9%

PCRUZ
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)3.1%
Missing0
Missing (%)0.0%
Memory size14.0 KiB
Value | Count
(empty) | 171
4 | 24
3 | 12
2 | 8
5 | 8
Other values (2) | 5

Length

Max length1
Median length0
Mean length0.25
Min length0

Characters and Unicode

Total characters57
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row3
4th row
5th row

Common Values

ValueCountFrequency (%)
171
75.0%
424
 
10.5%
312
 
5.3%
28
 
3.5%
58
 
3.5%
13
 
1.3%
62
 
0.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
424
42.1%
312
21.1%
28
 
14.0%
58
 
14.0%
13
 
5.3%
62
 
3.5%

Most occurring characters

ValueCountFrequency (%)
424
42.1%
312
21.1%
28
 
14.0%
58
 
14.0%
13
 
5.3%
62
 
3.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number57
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
424
42.1%
312
21.1%
28
 
14.0%
58
 
14.0%
13
 
5.3%
62
 
3.5%

Most occurring scripts

ValueCountFrequency (%)
Common57
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
424
42.1%
312
21.1%
28
 
14.0%
58
 
14.0%
13
 
5.3%
62
 
3.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII57
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
424
42.1%
312
21.1%
28
 
14.0%
58
 
14.0%
13
 
5.3%
62
 
3.5%

TRA_ESQUEM
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct10
Distinct (%)4.4%
Missing0
Missing (%)0.0%
Memory size14.1 KiB
Value | Count
(empty) | 171
1 | 24
99 | 20
11 | 5
12 | 3
Other values (5) | 5

Length

Max length2
Median length0
Mean length0.3728070175
Min length0

Characters and Unicode

Total characters85
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5 ?
Unique (%)2.2%

Sample

1st row
2nd row
3rd row99
4th row
5th row

Common Values

ValueCountFrequency (%)
171
75.0%
124
 
10.5%
9920
 
8.8%
115
 
2.2%
123
 
1.3%
51
 
0.4%
41
 
0.4%
21
 
0.4%
91
 
0.4%
31
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
124
42.1%
9920
35.1%
115
 
8.8%
123
 
5.3%
51
 
1.8%
41
 
1.8%
21
 
1.8%
91
 
1.8%
31
 
1.8%

Most occurring characters

ValueCountFrequency (%)
941
48.2%
137
43.5%
24
 
4.7%
31
 
1.2%
51
 
1.2%
41
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number85
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
941
48.2%
137
43.5%
24
 
4.7%
31
 
1.2%
51
 
1.2%
41
 
1.2%

Most occurring scripts

ValueCountFrequency (%)
Common85
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
941
48.2%
137
43.5%
24
 
4.7%
31
 
1.2%
51
 
1.2%
41
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII85
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
941
48.2%
137
43.5%
24
 
4.7%
31
 
1.2%
51
 
1.2%
41
 
1.2%

DSTRAESQUE
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct12
Distinct (%)5.3%
Missing0
Missing (%)0.0%
Memory size14.1 KiB
Value | Count
(empty) | 208
ARTESUNATO + MEFLOQUINA | 6
ARTESUNATO+MEFLOQUINA | 5
ARTESUNATO+MEFLOQUINA 100+ 200 | 1
PRIMAQUINA CLOROQUINA | 1
Other values (7) | 7

Length

Max length30
Median length0
Mean length1.99122807
Min length0

Characters and Unicode

Total characters454
Distinct characters24
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique9 ?
Unique (%)3.9%

Sample

1st row
2nd row
3rd rowARTESUNATO + MEFLOQUINA
4th row
5th row

Common Values

Value | Count | Frequency (%)
(empty) | 208 | 91.2%
ARTESUNATO + MEFLOQUINA | 6 | 2.6%
ARTESUNATO+MEFLOQUINA | 5 | 2.2%
ARTESUNATO+MEFLOQUINA 100+ 200 | 1 | 0.4%
PRIMAQUINA CLOROQUINA | 1 | 0.4%
ARTESUNARO + MEFLOQUINA | 1 | 0.4%
CLOROQUINA+PRIMAQUINA | 1 | 0.4%
CLOROQUINA PRIMAQUINA | 1 | 0.4%
ARTESUNATO MEFLOQUINA | 1 | 0.4%
ARTESUNATO E MEFLOQUINA | 1 | 0.4%
Other values (2) | 2 | 0.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mefloquina9
20.9%
artesunato8
18.6%
7
16.3%
artesunato+mefloquina6
14.0%
cloroquina2
 
4.7%
primaquina2
 
4.7%
2001
 
2.3%
e1
 
2.3%
150mg1
 
2.3%
pri1
 
2.3%
Other values (5)5
11.6%

Most occurring characters

ValueCountFrequency (%)
A58
12.8%
O43
 
9.5%
U39
 
8.6%
N39
 
8.6%
E33
 
7.3%
T32
 
7.0%
I28
 
6.2%
R25
 
5.5%
23
 
5.1%
Q23
 
5.1%
Other values (14)111
24.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter405
89.2%
Space Separator23
 
5.1%
Math Symbol16
 
3.5%
Decimal Number9
 
2.0%
Other Punctuation1
 
0.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A58
14.3%
O43
10.6%
U39
9.6%
N39
9.6%
E33
8.1%
T32
7.9%
I28
6.9%
R25
 
6.2%
Q23
 
5.7%
M20
 
4.9%
Other values (7)65
16.0%
Decimal Number
ValueCountFrequency (%)
05
55.6%
12
 
22.2%
21
 
11.1%
51
 
11.1%
Space Separator
ValueCountFrequency (%)
23
100.0%
Math Symbol
ValueCountFrequency (%)
+16
100.0%
Other Punctuation
ValueCountFrequency (%)
,1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin405
89.2%
Common49
 
10.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
A58
14.3%
O43
10.6%
U39
9.6%
N39
9.6%
E33
8.1%
T32
7.9%
I28
6.9%
R25
 
6.2%
Q23
 
5.7%
M20
 
4.9%
Other values (7)65
16.0%
Common
ValueCountFrequency (%)
23
46.9%
+16
32.7%
05
 
10.2%
12
 
4.1%
21
 
2.0%
,1
 
2.0%
51
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII454
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A58
12.8%
O43
 
9.5%
U39
 
8.6%
N39
 
8.6%
E33
 
7.3%
T32
 
7.0%
I28
 
6.2%
R25
 
5.5%
23
 
5.1%
Q23
 
5.1%
Other values (14)111
24.4%

DTRATA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct48
Distinct (%)21.1%
Missing0
Missing (%)0.0%
Memory size14.0 KiB
Value | Count
None | 171
2014-11-24 | 3
2014-02-14 | 2
2014-12-22 | 2
2014-02-24 | 2
Other values (43) | 48

Length

Max length10
Median length4
Mean length5.5
Min length4

Characters and Unicode

Total characters1254
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique38 ?
Unique (%)16.7%

Sample

1st rowNone
2nd rowNone
3rd row2014-01-07
4th rowNone
5th rowNone

Common Values

ValueCountFrequency (%)
None171
75.0%
2014-11-243
 
1.3%
2014-02-142
 
0.9%
2014-12-222
 
0.9%
2014-02-242
 
0.9%
2014-06-232
 
0.9%
2014-05-182
 
0.9%
2014-10-272
 
0.9%
2014-04-042
 
0.9%
2014-02-122
 
0.9%
Other values (38)38
 
16.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none171
75.0%
2014-11-243
 
1.3%
2014-04-042
 
0.9%
2014-06-232
 
0.9%
2014-10-272
 
0.9%
2014-02-122
 
0.9%
2014-05-182
 
0.9%
2014-02-142
 
0.9%
2014-12-222
 
0.9%
2014-02-242
 
0.9%
Other values (38)38
 
16.7%

Most occurring characters

ValueCountFrequency (%)
N171
13.6%
o171
13.6%
n171
13.6%
e171
13.6%
0121
9.6%
-114
9.1%
1107
8.5%
2101
8.1%
477
6.1%
712
 
1.0%
Other values (5)38
 
3.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter513
40.9%
Decimal Number456
36.4%
Uppercase Letter171
 
13.6%
Dash Punctuation114
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0121
26.5%
1107
23.5%
2101
22.1%
477
16.9%
712
 
2.6%
99
 
2.0%
38
 
1.8%
68
 
1.8%
57
 
1.5%
86
 
1.3%
Lowercase Letter
ValueCountFrequency (%)
o171
33.3%
n171
33.3%
e171
33.3%
Uppercase Letter
ValueCountFrequency (%)
N171
100.0%
Dash Punctuation
ValueCountFrequency (%)
-114
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin684
54.5%
Common570
45.5%

Most frequent character per script

Common
ValueCountFrequency (%)
0121
21.2%
-114
20.0%
1107
18.8%
2101
17.7%
477
13.5%
712
 
2.1%
99
 
1.6%
38
 
1.4%
68
 
1.4%
57
 
1.2%
Latin
ValueCountFrequency (%)
N171
25.0%
o171
25.0%
n171
25.0%
e171
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1254
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N171
13.6%
o171
13.6%
n171
13.6%
e171
13.6%
0121
9.6%
-114
9.1%
1107
8.5%
2101
8.1%
477
6.1%
712
 
1.0%
Other values (5)38
 
3.0%

DT_ENCERRA
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing228
Missing (%)100.0%
Memory size1.9 KiB

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
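The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code the report itself runs; the function name pearson_r and the sample numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: y is an exact linear function of x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(x, y))
# Shifting and rescaling y (with a positive scale) leaves r unchanged.
print(pearson_r(x, [3.0 * v + 5.0 for v in y]))
```

The guard against constant inputs matters for this dataset, since several columns (for example NU_ANO) are flagged as constant and have zero variance.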

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
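As a rough sketch of that definition, the following Python ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is an assumption of this sketch (a common convention), and the names average_ranks and spearman_rho are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: y grows as the cube of x, so the relation is monotonic
# but not linear; the ranks of x and y are nevertheless identical.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```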

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
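The pair-counting rule above corresponds to the variant often called tau-a. A minimal Python sketch follows, assuming that tied pairs count as neither concordant nor discordant; the function name kendall_tau_a is hypothetical, and statistical libraries frequently default to a tie-adjusted variant instead.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it contributes to neither count
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: of the three pairs, two are concordant and one is discordant.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```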

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
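For illustration, the bias-corrected estimator can be sketched in plain Python as below. This follows the commonly cited form of Bergsma's correction and is an assumption about the formula, not a copy of the code the report runs; the function name cramers_v_corrected and the sample labels are hypothetical.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over every cell, including empty ones.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias and shrink the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        # Constant variable or too few rows: treated here as no measurable
        # association (a choice made for this sketch).
        return 0.0
    return math.sqrt(phi2_corr / denom)


# Hypothetical labels: the first pair is perfectly associated, the second is independent.
print(cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["u"] * 50 + ["v"] * 50))
print(cramers_v_corrected(["a", "a", "b", "b"], ["u", "v", "u", "v"]))
```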

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
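A per-column nullity count like the one these plots summarise can be sketched in plain Python. In this report, columns such as DTRATA show the literal text None 171 times while reporting 0 missing, and ID_OCUPA_N shows 144 empty values with 0 missing, which suggests that placeholder strings are counted as ordinary values. The optional extra_markers parameter below is a hypothetical way to treat such placeholders as missing; the function name and sample rows are also hypothetical.

```python
def nullity_by_column(rows, columns, extra_markers=()):
    """Return {column: (missing_count, missing_percent)} for a list of dict rows."""
    markers = set(extra_markers)
    total = len(rows)
    summary = {}
    for column in columns:
        missing = 0
        for row in rows:
            value = row.get(column)
            # None is always missing; strings such as "" or "None" only count
            # when the caller lists them in extra_markers.
            if value is None or (isinstance(value, str) and value in markers):
                missing += 1
        percent = 100.0 * missing / total if total else 0.0
        summary[column] = (missing, percent)
    return summary


# Hypothetical rows shaped like two of the report's columns.
rows = [
    {"DTRATA": "None", "PMM": None},
    {"DTRATA": "2014-01-07", "PMM": 490.0},
]
print(nullity_by_column(rows, ["DTRATA", "PMM"]))
print(nullity_by_column(rows, ["DTRATA", "PMM"], extra_markers=("", "None")))
```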

Sample

First rows

TP_NOTID_AGRAVODT_NOTIFICSEM_NOTNU_ANOSG_UF_NOTID_MUNICIPID_REGIONAID_UNIDADEDT_SIN_PRISEM_PRIDT_NASCNU_IDADE_NCS_SEXOCS_GESTANTCS_RACACS_ESCOL_NSG_UFID_MN_RESIID_RG_RESIID_PAISDT_INVESTID_OCUPA_NCLASSI_FINAT_ATIVIDAAT_LAMINAAT_SINTOMATPAUTOCTOCOUFINFCOPAISINFCOMUNINFLOC_INFDEXAMERESULTPMMPCRUZTRA_ESQUEMDSTRAESQUEDTRATADT_ENCERRA
02B542014-01-0320140120143333060022937492013-12-252013521983-08-284030M6109333300951NaT0NoneNaNNoneNaT
12B542014-01-0320140120143333045522807952013-12-272013521998-10-074015F5407333304551NaT232102014-01-031NaNNoneNaT
22B542014-01-0420140120143333045527084342014-01-022014011955-09-194058M6108333304551NaT214205111212312014-01-072490.0399ARTESUNATO + MEFLOQUINA2014-01-07NaT
32B542014-01-0720140220143333045554763212014-01-062014021975-07-294038F5108333304551NaT2112102014-01-071NaNNoneNaT
42B542014-01-0820140220143333045554763212014-01-062014021988-08-154025M61333304551NaT2112102014-01-081NaNNoneNaT
52B542014-01-0920140220143333045554763212014-01-092014021972-05-264041F5108333304551NaT242102014-01-091NaNNoneNaT
62B542014-01-0920140220143333020038103482013-12-152013511987-05-014026M6404333302001NaT7821152111202014-01-101NaNNoneNaT
72B542014-01-1020140220143333045522883382014-01-062014022012-09-264001F6110333304901NaT2101102014-01-101NaNNoneNaT
82B542014-01-1020140220143333045522883382014-01-062014021981-11-074032M6108333304901NaT2101102014-01-101NaNNoneNaT
92B542014-01-1220140320143333045554763212014-01-112014021981-08-264032M6107333304551NaT2112102014-01-121NaNNoneNaT

Last rows

TP_NOTID_AGRAVODT_NOTIFICSEM_NOTNU_ANOSG_UF_NOTID_MUNICIPID_REGIONAID_UNIDADEDT_SIN_PRISEM_PRIDT_NASCNU_IDADE_NCS_SEXOCS_GESTANTCS_RACACS_ESCOL_NSG_UFID_MN_RESIID_RG_RESIID_PAISDT_INVESTID_OCUPA_NCLASSI_FINAT_ATIVIDAAT_LAMINAAT_SINTOMATPAUTOCTOCOUFINFCOPAISINFCOMUNINFLOC_INFDEXAMERESULTPMMPCRUZTRA_ESQUEMDSTRAESQUEDTRATADT_ENCERRA
2182B542014-12-1120145020143333045554763212014-12-102014501979-07-284035M6108333304551NaT2521052112102014-12-111NaNNoneNaT
2192B542014-12-1620145120143333045522883382014-12-152014511969-02-104045M6908333304551NaT2631102101102014-12-161NaNNoneNaT
2202B542014-12-1620145120143333045522883382014-12-142014511983-01-284031M6908333304551NaT2631102101102014-12-161NaNNoneNaT
2212B542014-12-1620145120143333045522883382014-12-142014511966-10-144048M6908333304551NaT2631102101102014-12-161NaNNoneNaT
2222B542014-12-1720145120143333045554763212014-12-142014511951-08-154063M6108333304551NaT9999932112102014-12-171NaNNoneNaT
2232B542014-12-2220145220143333045530034502014-12-182014511977-08-284037M6406333304551NaT51012511121231LUAN2014-12-22210002.05112014-12-22NaT
2242B542014-12-2220145220143333045522883382014-12-172014511996-03-204018F5105333303501NaT999991110112AM11303802014-12-2246080.0412014-12-22NaT
2252B542014-12-2720145220143333045522702342014-12-222014521982-08-214032M6407333304551NaT4142151102121772014-12-27210000.0499ARTESUNATO + MEFLOQUINA2014-12-27NaT
2262B542014-12-2820145320143333045530229352014-12-262014521962-04-234052M6208333304551NaT1414101112121772014-12-282684.04112014-12-28NaT
2272B542014-12-2920145320143333045554763212014-12-262014521960-09-014054M6406333304201NaT1424102112102014-12-291NaNNoneNaT